Abstract



The periodic transference of nucleotide strings in bacterial and 
archaeal complete genomes is investigated by using the metric rep- 
resentation and the recurrence plot method. The generated periodic 
correlation structures exhibit four kinds of fundamental transferring 
characteristics: a single increasing period, several increasing periods, 
an increasing quasi-period and almost no increasing period. The mechanism
of the periodic transference is further analyzed by determining
all long periodic nucleotide strings in the bacterial and archaeal com- 
plete genomes and is explained as follows: both the repetition of basic 
periodic nucleotide strings and the transference of non-periodic nu- 
cleotide strings would form the periodic correlation structures with 
approximately the same increasing periods. 

Keywords: Bacterial and archaeal complete genomes, Periodic
correlation structures, Metric representation, Recurrence plots
1 Introduction



Since complete genomes of many organisms are available from web-based
databases, a full and systematic search of genome structures, functions and
dynamics has become an essential part of the study for both biologists and
physicists. Given the large number of genomes, developing quantitative
methods to extract meaningful information is a major challenge with respect
to applications of statistical mechanics and nonlinear dynamics to biological
systems [1, 2]. To understand the complete genomes, some statistical and
geometrical methods were developed [3-18]. Studies of the complete genomes
of many organisms have determined nontrivial statistical characteristics,
such as long-range correlations, short-range correlations and fractal
features or genomic signatures. In particular, it was found that
transposable elements, as mobile DNA sequences, have the ability to move
from one place to another and make many replicas within the genome via
transposition [19, 20, 21]. Their origin, evolution, and tremendous effects
on the genome structure and the gene function are issues of fundamental
importance in biology [22, 23, 24].

In general, symbolic dynamics and recurrence plots are basic methods of
nonlinear dynamics for analyzing complex systems [25, 26]. Although the
conventional methods have made great strides in understanding genetic
patterns, they are still needed to analyze the so-called junk DNA with
complex functions governing mutations [27, 28]. Recently, a one-to-one
metric representation of a genome, borrowed from symbolic dynamics, was
proposed to form a fractal pattern in a plane [29, 30]. By using the metric
representation method, the recurrence plot technique of the genome was
established to analyze the correlation structures of nucleotide strings
[31, 32]. The transference of nucleotide strings appears at many positions
of a complete genome and forms regular and irregular correlation structures,
but the periodic correlation structures in the complete genome are the most
interesting in view of the dynamics. In this paper, using the metric
representation and the recurrence plot method, we identify periodic
correlation structures in bacterial and archaeal complete genomes and
analyze the mechanism of the periodic correlation structures. Since the
nucleotide strings include transposable elements, the mechanism is conducive
to understanding the genome structures in terms of nucleotide strings
transferring in the genomes and to exploring relations between the
transference of nucleotide strings and the transposable elements.

2 Correlation structures in periodic and random sequences

In what follows, we give a brief presentation of the metric representation
and the recurrence plot method, which are detailed in [29], [30], [31], [32].
For a given symbolic sequence $s_1 s_2 \cdots s_i \cdots s_N$
($s_i \in \{A, C, G, T\}$), a metric representation for its subsequences
$\Sigma_k = s_1 s_2 \cdots s_k$ ($1 \le k \le N$) is defined as

$$\alpha_k = 2\sum_{j=1}^{k} \mu_{k-j+1} 3^{-j} + 3^{-k} = 2\sum_{i=1}^{k} \mu_i 3^{-(k-i+1)} + 3^{-k},$$
$$\beta_k = 2\sum_{j=1}^{k} \nu_{k-j+1} 3^{-j} + 3^{-k} = 2\sum_{i=1}^{k} \nu_i 3^{-(k-i+1)} + 3^{-k}, \qquad (1)$$

where $\mu_i$ is 0 if $s_i \in \{A, C\}$ or 1 if $s_i \in \{G, T\}$ and $\nu_i$
is 0 if $s_i \in \{A, T\}$ or 1 if $s_i \in \{C, G\}$. It maps the
one-dimensional symbolic sequence to the two-dimensional plane
$(\alpha, \beta)$. The subsequences with the same ending $l$-nucleotide
string are labeled with $\Sigma^l$. They correspond to points in the zone
encoded by the $l$-nucleotide string. With two subsequences
$\Sigma_i \in \Sigma^l$ and $\Sigma_j$ ($j \ge l$), calculate

$$\Theta(\epsilon_l - |\Sigma_i - \Sigma_j|) = \Theta\left(\epsilon_l - \sqrt{(\alpha_i - \alpha_j)^2 + (\beta_i - \beta_j)^2}\right), \qquad (2)$$

where $\epsilon_l = 1/3^l$ and $\Theta$ is the Heaviside function
[$\Theta(x) = 1$, if $x > 0$; $\Theta(x) = 0$, if $x \le 0$]. When
$\Theta(\epsilon_l - |\Sigma_i - \Sigma_j|) = 1$, i.e.,
$\Sigma_j \in \Sigma^l$, a point $(i, j)$ is plotted on a plane. Repeating
the above process for $i \in [l, N]$ and $j \in [l, N]$, we obtain a
recurrence plot of the symbolic sequence. To present the correlation
structure in the recurrence plot plane, we define a correlation intensity at
a given correlation distance $d$ as

$$\Xi(d) = \sum_{i=1}^{N-d} \Theta(\epsilon_l - |\Sigma_i - \Sigma_{i+d}|), \qquad (3)$$

which displays the transference of $l$-nucleotide strings in the symbolic
sequence. On the recurrence plot plane, since $\Sigma_i$ and
$\Sigma_j \in \Sigma^l$, the transferring element has a length of at least
$l$. We calculate the maximal value of $x$ that satisfies

$$\Theta(\epsilon_l - |\Sigma_{i+x} - \Sigma_{j+x}|) = 1, \quad x = 0, 1, 2, \cdots, x_{max}, \qquad (4)$$

i.e., $\Sigma_{i+x}$ and $\Sigma_{j+x} \in \Sigma^l$. The transferring
element has a length $L = l + x_{max}$ and is placed at the positions
$(i - l + 1, i + x_{max})$ and $(j - l + 1, j + x_{max})$, which implies the
correlation distance $d = j - i$.
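The definitions in Eqs. (1), (2) and (4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the recursion $\alpha_k = (2\mu_k + \alpha_{k-1})/3$ with $\alpha_0 = 1$ (and likewise for $\beta_k$) is used because it is algebraically equivalent to the sums in Eq. (1). Double-precision floats are assumed to be accurate enough for the small values of $l$ used here.

```python
import math

# Digits of Eq. (1): mu = 0 for A, C and 1 for G, T; nu = 0 for A, T and 1 for C, G.
MU = {"A": 0, "C": 0, "G": 1, "T": 1}
NU = {"A": 0, "T": 0, "C": 1, "G": 1}


def metric_representation(seq):
    """Return [(alpha_k, beta_k)] for k = 1..N via the recursion
    alpha_k = (2*mu_k + alpha_{k-1}) / 3, alpha_0 = 1 (same for beta)."""
    alpha, beta = 1.0, 1.0
    points = []
    for s in seq.upper():
        alpha = (2 * MU[s] + alpha) / 3.0
        beta = (2 * NU[s] + beta) / 3.0
        points.append((alpha, beta))
    return points


def recurs(points, i, j, l):
    """Heaviside test of Eq. (2) for 1-based indices i and j."""
    (ai, bi), (aj, bj) = points[i - 1], points[j - 1]
    return math.hypot(ai - aj, bi - bj) < 3.0 ** (-l)


def transferred_length(points, i, j, l):
    """Length L = l + x_max of Eq. (4); 0 if (i, j) is not a recurrence."""
    if not recurs(points, i, j, l):
        return 0
    n = len(points)
    x = 0
    # Extend forward while both shifted subsequences stay within epsilon_l.
    while max(i, j) + x + 1 <= n and recurs(points, i + x + 1, j + x + 1, l):
        x += 1
    return l + x
```

For the toy sequence `TTACGTACGGACGTACT`, the string `ACGTAC` occupies positions (3-8) and (11-16); calling `transferred_length(points, 5, 13, 3)` recovers its length 6 at the correlation distance $d = 13 - 5 = 8$.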

To understand the transferring characteristics of a complex genome, we
first investigate the correlation structures of simple periodic and random
sequences. By randomly combining the four letters A, C, G and T, we generate
two random nucleotide sequences: one of length 67 and another of length
5000. Then, a periodic nucleotide sequence with a total length of 5000 is
formed by repeating the short (length-67) nucleotide string. Using the
metric representation and the recurrence plot method, we determine the
correlation intensities at different correlation distances with $l = 8$ for
the periodic and random sequences in Fig. 1. In Fig. 1(a), there exist
equidistant parallel lines with a basic correlation distance, which form the
periodic correlation structure for the periodic sequence. The basic
correlation distance, hereinafter called the basic periodic length, is
determined as $d_b = 67$. The correlation intensity decreases linearly with
the increase of the correlation distance ($d_b, 2d_b, \cdots$). However, in
Fig. 1(b), the correlation intensity $\Xi(d)$ is very small, so there are
almost no correlation structures for the random sequence. Therefore, the
periodic and random sequences exhibit two very different transferring
characteristics: a periodic correlation structure with a linearly decreasing
intensity, and no clear correlation structure, respectively.
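The numerical experiment described above can be reproduced approximately with the following sketch. The random seed, the helper names and the specific random strings are assumptions made for illustration, so the numbers will not match Fig. 1 exactly; only the qualitative behaviour (strong, linearly decreasing intensity at multiples of 67 for the periodic sequence, almost none for the random one) is expected to carry over.

```python
import random

MU = {"A": 0, "C": 0, "G": 1, "T": 1}
NU = {"A": 0, "T": 0, "C": 1, "G": 1}


def metric_points(seq):
    """Metric representation (alpha_k, beta_k), k = 1..N, computed recursively."""
    alpha, beta, pts = 1.0, 1.0, []
    for s in seq:
        alpha = (2 * MU[s] + alpha) / 3.0
        beta = (2 * NU[s] + beta) / 3.0
        pts.append((alpha, beta))
    return pts


def correlation_intensity(pts, d, l):
    """Xi(d): number of i in 1..N-d with |Sigma_i - Sigma_{i+d}| < 3**(-l)."""
    eps_sq = (3.0 ** (-l)) ** 2
    count = 0
    for (a1, b1), (a2, b2) in zip(pts, pts[d:]):
        if (a1 - a2) ** 2 + (b1 - b2) ** 2 < eps_sq:
            count += 1
    return count


rng = random.Random(0)  # fixed seed: an arbitrary choice for reproducibility
unit = "".join(rng.choice("ACGT") for _ in range(67))
periodic = (unit * 75)[:5000]           # periodic sequence of total length 5000
rand_seq = "".join(rng.choice("ACGT") for _ in range(5000))

p_pts, r_pts = metric_points(periodic), metric_points(rand_seq)
xi_periodic = {d: correlation_intensity(p_pts, d, 8) for d in (67, 134, 201)}
xi_random = {d: correlation_intensity(r_pts, d, 8) for d in (67, 134, 201)}
```

For the periodic sequence, each additional period removes about 67 recurrent pairs, which is the linear decrease of the intensity with $d_b, 2d_b, \cdots$ noted in the text.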

3 Periodic nucleotide strings in bacterial and 
archaeal complete genomes 

At the end of 1999, the complete genomes of more than 20 bacteria were
available in GenBank [7]. By using the string composition and the metric
representation method, the suppressions of all short strings in 23 bacterial
and archaeal complete genomes were determined [3, 30]. In this section,
using the metric representation and the recurrence plot method, we determine
all long periodic nucleotide strings (> 20 bases) in the 23 bacterial and
archaeal genomes. Of the 23 genomes, only 13 have long periodic nucleotide
strings. All basic strings of the long periodic nucleotide strings in the 13
bacterial and archaeal genomes, together with their lengths, are presented
in Table I in the order of decreasing suppressions of nucleotide strings
[30]. Several periods and different basic strings can be seen depending on
the genomes, but not necessarily on the lengths of the genomes. The genomes
of Helicobacter pylori 26695 (hpyl), Helicobacter pylori J99 (hpyl99),
Haemophilus influenzae Rd KW20 (hinf), Mycobacterium tuberculosis H37Rv
(mtub) and Synechocystis sp. PCC6803 (synecho) have more periods ($\ge 6$)
and basic strings ($\ge 9$) than the others, which have fewer periods
($\le 3$) and basic strings ($\le 4$). In each period, the number of the
basic strings generally depends on the length of the period: the longer the
period, the fewer basic strings it has, and vice versa. In the next section,
we will investigate the periodic transference of nucleotide strings in the
bacterial and archaeal complete genomes and analyze the effects of periodic
nucleotide strings on the correlation structures.
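To make the notion of a long periodic nucleotide string concrete, the sketch below scans a sequence for maximal stretches in which every base equals the base $p$ positions later. It is a direct string comparison, not the recurrence plot procedure used in the paper, and the helper name `periodic_runs` and its `min_len` parameter are hypothetical. The example assumes that the garbled period-7 basic string of hpyl reads `TGATTAG`, the only reading of the form tga(t...)ag with seven bases.

```python
def periodic_runs(seq, p, min_len=20):
    """Return (start, end, basic_string) for maximal stretches of `seq`
    (1-based, inclusive positions) that repeat with period p, contain at
    least two full periods and are at least `min_len` bases long."""
    n = len(seq)
    runs = []
    k = 0
    while k + p < n:
        if seq[k] != seq[k + p]:
            k += 1
            continue
        start = k
        while k + p < n and seq[k] == seq[k + p]:
            k += 1
        length = (k - start) + p  # matched run plus the trailing period
        if k - start >= p and length >= min_len:
            runs.append((start + 1, start + length, seq[start:start + p]))
    return runs


# A string of length 25*7 + 6 = 181 built from the assumed basic string.
example = ("TGATTAG" * 26)[:181]
found = periodic_runs(example, 7)
```

A stretch with period $p$ is also reported for the multiples $2p, 3p, \cdots$ when it is long enough, so in practice only the smallest period of each stretch would be kept as the basic string length.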

4 Periodic correlation structures in bacterial 
and archaeal complete genomes 

The periodic correlation structures of a complete genome contain several
basic periodic and/or quasi-periodic lengths, which are determined by using
the metric representation and the recurrence plot method as follows. From
the relationship between the correlation intensity and the correlation
distance obtained by using Eq. (3), the basic periodic lengths and their
integer multiples with strong correlation intensities can be calculated.
Moreover, in the transference of nucleotide strings obtained by using
Eq. (4), the correlation distances equal to basic periodic lengths and their
integer multiples can also be found. By using both methods, the basic
periodic lengths of the periodic correlation structures are determined, as
shown in Table II, where the 23 complete genomes with official GenBank
accession numbers are arranged in the order of decreasing suppressions of
nucleotide strings [30]. When a periodic correlation structure has only a
few peaks of the correlation intensity within the correlation distance, the
basic periodic lengths are put in parentheses. To see the characteristics of
the periodic correlation structures, we also present all basic string
lengths of long periodic nucleotide strings (> 20 bases) in Table II. When a
periodic correlation structure is identified based on a long periodic
nucleotide string, the transference of nucleotide strings composed of the
basic strings appears at positions where the correlation distance is an
integer multiple of the period and monotonically increases. At the same
time, the lengths of the transferred nucleotide strings monotonically
decrease. There exists a "cascade" arrangement of nucleotide strings related
to the basic periodic length. However, when a periodic correlation structure
is identified based on non-periodic nucleotide strings, the transference of
nucleotide strings appears at several positions where the correlation
distance is almost an integer multiple of the basic periodic length. There
is no "cascade" arrangement of nucleotide strings related to the basic
periodic length. According to the characteristics of the periodic
correlation structures, the results can be summarized as follows:

(1) The correlation distance contains a single increasing period. Most of
the complete genomes with a single increasing period have a basic periodic
length of 67. They include the Methanococcus jannaschii DSM 2661 (mjan),
Methanobacterium thermoautotrophicum str. delta H (mthe), Pyrococcus
horikoshii OT3 (pyro), Archaeoglobus fulgidus DSM 4304 (aful), Pyrococcus
abyssi (pabyssi) and Thermotoga maritima MSB8 (tmar) genomes. Consider the
mjan genome as an example. Fig. 2 displays the correlation intensity at
different correlation distances with $l = 15$ for the mjan genome. There
exist some equidistant parallel lines with a basic periodic length, which
form a periodic correlation structure. The basic periodic length is
determined as $d_b = 67$. Generally, if the genome has a periodic nucleotide
string with the basic string length $p = d_b$, it would tend to form a
periodic correlation structure. In Table II, the mjan genome has the
corresponding basic string length $p = d_b$ for periodic nucleotide strings.
For example, the nucleotide string $\Sigma_1$ = at'^ ... aH (237122-237620)
with $L_{\Sigma_1} = 499$ is formed by repeating the basic string at^ ... t^c
with the length $p = d_b$, where $L_{\Sigma_1} = 7p + 30$. In other words,
the basic string is duplicated to the positions with the correlation
distances $p$, $2p$, $3p$, $4p$, $5p$, $6p$ and $7p$. Despite the possible
contribution from such periodic nucleotide strings, the periodic correlation
structure is mainly formed by the transference of non-periodic nucleotide
strings, which has approximately the same increasing period. For example,
the nucleotide string $\Sigma_2$ = aH'^aHcagac^gt^cg'^aHg^a^ (447-476) with
$L_{\Sigma_2} = 30$ is transferred to the places (514-543), (581-610),
(651-680), (718-747), (785-814), (855-884), (922-951), (994-1023),
(1064-1093), (1132-1161) and so on, with the correlation distances $d_b$,
$2d_b$, $3d_b + 3$, $4d_b + 3$, $5d_b + 3$, $6d_b + 6$, $7d_b + 6$,
$8d_b + 11$, $9d_b + 14$, $10d_b + 15$ and so on, respectively. Since the
nucleotide string $\Sigma_2$ is neither periodic nor a part of a periodic
nucleotide string, its periodic transference is not a repetition of basic
periodic nucleotide strings. Moreover, Fig. 2 shows that there also exists a
cluster of basic periodic lengths close to $d_b$. Their integer multiples
are distributed near the periodic correlation structure. Table II shows that
there also exists another basic string length $p = 68$ for periodic
nucleotide strings, which is conducive to forming the cluster distribution
near the periodic correlation structure. So both the repetition of basic
periodic nucleotide strings and the transference of non-periodic nucleotide
strings would form the periodic correlation structure with approximately the
same increasing period.
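The phrase "approximately the same increasing period" can be checked directly from the positions quoted above for the 30-base string that starts at 447. The short sketch below splits each correlation distance into a whole number of periods plus a remainder; the helper name `decompose` is hypothetical and the rounding to the nearest multiple is a choice made for this illustration.

```python
D_B = 67  # basic periodic length of the mjan genome quoted in the text

origin = 447  # start of the original 30-base string
starts = [514, 581, 651, 718, 785, 855, 922, 994, 1064, 1132]


def decompose(distance, period):
    """Write distance = m * period + r with m the nearest whole number of periods."""
    m = round(distance / period)
    return m, distance - m * period


# One (m, r) pair per transferred copy, e.g. (3, 3) means 3*d_b + 3.
offsets = [decompose(s - origin, D_B) for s in starts]
```

The remainders grow slowly (0, 0, 3, 3, 3, 6, 6, 11, 14, 15), which is the drift expected when copies of a non-periodic string are spaced by slightly more than 67 bases rather than by an exact repetition of a basic string.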

Unlike the mjan genome, the other genomes (mthe, pyro, aful, pabyssi and
tmar) have no periodic nucleotide strings with the basic string length
$p = d_b$ to make contributions to the periodic correlation structure. So
their periodic correlation structure is formed by the transference of
non-periodic nucleotide strings. Furthermore, the genomes of Mycoplasma
genitalium G37 (mgen), hinf, Mycoplasma pneumoniae M129 (mpneu), Treponema
pallidum subsp. pallidum str. Nichols (tpal), Aeropyrum pernix K1 (aero),
Rickettsia prowazekii str. Madrid E (rpxx) and Borrelia burgdorferi B31
(bbur) have basic periodic lengths $d_b$ = 3, 4, 12, 24, 65, 84 and 162,
respectively. In Table II, they correspond to periodic nucleotide strings
with the basic length $p = d_b$, except for the aero genome. So both the
repetition of basic periodic nucleotide strings and the transference of
non-periodic nucleotide strings would form the periodic correlation
structure with approximately the same increasing period.

(2) The correlation distance contains several increasing periods. The
Escherichia coli K-12 MG1655 (ecoli) genome has two basic periodic lengths,
100 and 113. The hpyl99 genome has three basic periodic lengths, 8, 15 and
21. The mtub genome has three basic periodic lengths, 9, 15 and 57. Consider
the hpyl99 genome as an example. Fig. 3 displays the correlation intensity
at different correlation distances with $l = 15$ for the hpyl99 genome.
There exist some equidistant parallel lines with basic periodic lengths,
which form periodic correlation structures. Three basic periodic lengths are
determined as $d_{b_1} = 8$, $d_{b_2} = 15$ and $d_{b_3} = 21$. Although
there are some other peaks of the correlation intensity in the correlation
distance as shown in Fig. 3, they do not form any periodic correlation
structures and are not taken into account. Table II also shows some periodic
nucleotide strings with basic string lengths $p_1 = d_{b_1}$,
$p_2 = d_{b_2}$, $p_3 = d_{b_3}$ and their integer multiples, which
contribute to the periodic correlation structures. For example, the
nucleotide string $\Sigma_1$ = ca^ ... ca? (1061079-1061153) with
$L_{\Sigma_1} = 75$ is formed by repeating the basic string ca?(?at^ with
the length $p_1$, where $L_{\Sigma_1} = 9p_1 + 3$. The nucleotide string
$\Sigma_2$ = a? ... ca^ (5153-5280) with $L_{\Sigma_2} = 128$ is formed by
repeating the basic string a^ca^go?t^ with the length $p_2$, where
$L_{\Sigma_2} = 8p_2 + 8$. The nucleotide string $\Sigma_3$ = atc ... tca
(659300-659450) with $L_{\Sigma_3} = 151$ is formed by repeating the basic
string atcata^t^a^cHcaHc with the length $p_3$, where
$L_{\Sigma_3} = 7p_3 + 4$. Although the transference of non-periodic
nucleotide strings might also contribute to the periodic correlation
structures, they are mainly formed by repeating the basic periodic
nucleotide strings. For example, the non-periodic nucleotide string
$\Sigma_4$ = aga^cHa?cta?ga^c (59514-59537) with $L_{\Sigma_4} = 24$ is
transferred to the places (59640-59663) and (59724-59747) with the
correlation distances $6d_{b_3}$ and $10d_{b_3}$, respectively. So both the
repetition of basic periodic nucleotide strings and the transference of
non-periodic nucleotide strings would form the periodic correlation
structures with approximately the same increasing periods.
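As a consistency check on the hpyl99 example, the quoted string lengths and correlation distances follow from the quoted positions and the three basic periodic lengths 8, 15 and 21 (a worked calculation using only the numbers given in the text):

```latex
\begin{aligned}
L_{\Sigma_1} &= 1061153 - 1061079 + 1 = 75  = 9 \cdot 8 + 3,\\
L_{\Sigma_2} &= 5280 - 5153 + 1       = 128 = 8 \cdot 15 + 8,\\
L_{\Sigma_3} &= 659450 - 659300 + 1   = 151 = 7 \cdot 21 + 4,\\
d(\Sigma_4)  &= 59640 - 59514 = 126 = 6 \cdot 21, \qquad
                59724 - 59514 = 210 = 10 \cdot 21 .
\end{aligned}
```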

(3) The correlation distance has an increasing quasi-period. The Bacillus
subtilis subsp. subtilis str. 168 (bsub) genome has a basic quasi-periodic
length of about 5000. Fig. 4 shows the correlation intensity at different
correlation distances with $l = 15$ for the bsub genome. There exist some
approximately equidistant parallel lines at the positions $d$ = 4996, 10605,
15427 and 20468, which form a quasi-periodic correlation structure with a
basic quasi-periodic length $d_b \approx 5000$. Although a stronger
correlation intensity appears at the position $d = 5856$, it is far away
from the quasi-periodic correlation structure and is not taken into account.
In Table II, there are no periodic nucleotide strings with the length
$p = d_b$ to make a contribution to the quasi-periodic correlation
structure. For example, the non-periodic nucleotide string $\Sigma_1$ =
agc ... tac (167978-169382) with $L_{\Sigma_1} = 1405$ is transferred to the
place (172974-174378) with the correlation distance 4996. The non-periodic
nucleotide string $\Sigma_2$ = ... ci^ (161449-161666) with
$L_{\Sigma_2} = 218$ is transferred to the places (167057-167274),
(172056-172273) and (946761-946798) with the correlation distances 5608,
10607 and 785321, respectively. So the transference of non-periodic
nucleotide strings would form the quasi-periodic correlation structure.

(4) The correlation distance contains a combination of several increasing
periods and an increasing quasi-period. Firstly, the hpyl genome has two
basic periodic lengths, 7 and 8, and a basic quasi-periodic length of 114.
Fig. 5(a) shows the correlation intensity at different correlation distances
with $l = 15$ for the hpyl genome, with a local region magnified. There
exist some equidistant parallel lines with basic periodic lengths, which
form periodic correlation structures in a short range of the correlation
distance. The two basic periodic lengths are determined as $d_{b_1} = 7$ and
$d_{b_2} = 8$. Moreover, in Fig. 5(a), there also exist some approximately
equidistant parallel lines at the positions $d$ = 96, 207, 324, 438, 552,
666 and 780, which form a quasi-periodic correlation structure in a long
range of the correlation distance. The quasi-periodic correlation distance
is described as $d \approx 96 + x d_{b_3}$, where the basic quasi-periodic
length $d_{b_3}$ is 114 and $x = 0, 1, 2, \cdots$. In Table II, there exist
some periodic nucleotide strings with basic string lengths $p_1 = d_{b_1}$,
$p_2 = d_{b_2}$ and their integer multiples, but no periodic nucleotide
strings with the basic string length $p_3 = d_{b_3}$. For example, the
nucleotide string $\Sigma_1$ = tga ... t^a (1-181) with $L_{\Sigma_1} = 181$
is formed by repeating the basic string tgat^ag with the length $p_1$, where
$L_{\Sigma_1} = 25p_1 + 6$. The nucleotide string $\Sigma_2$ = t'^g ... tga
(444403-444490) with $L_{\Sigma_2} = 88$ is formed by repeating the basic
string t^gct^ga with the length $p_2$, where $L_{\Sigma_2} = 11p_2$.
Although the transference of non-periodic nucleotide strings might also
contribute to the periodic correlation structures, they are mainly formed by
repeating the basic periodic nucleotide strings. For example, the
non-periodic nucleotide string $\Sigma_3$ = atgtatg'^catg^catgtatg
(84905-84926) with $L_{\Sigma_3} = 22$ is transferred to the place
(84929-84950) with the correlation distance $3d_{b_2}$. Moreover, for the
quasi-periodic correlation structure, the non-periodic nucleotide string
$\Sigma_4$ = gtgagtaH^ctcgcai^cfictc (556196-556224) with
$L_{\Sigma_4} = 29$ is transferred to the places (556634-556662),
(556748-556776), (556862-556906), (557300-557328), (557414-557442) and
(557852-557880) with the correlation distances $96 + 3d_{b_3}$,
$96 + 4d_{b_3}$, $96 + 5d_{b_3}$, $78 + 9d_{b_3}$, $78 + 10d_{b_3}$ and
$60 + 14d_{b_3}$, respectively. So both the repetition of basic periodic
nucleotide strings and the transference of non-periodic nucleotide strings
would form the periodic correlation structures with approximately the same
increasing periods in a short correlation distance, but only the
transference of non-periodic nucleotide strings would form the
quasi-periodic correlation structure in a long correlation distance.
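The quasi-periodic length of 114 can be read off from the peak positions quoted for the hpyl genome. The sketch below takes the median spacing between consecutive peaks and reports how far each peak lies from the fitted positions $96 + x \cdot 114$; using the median is an assumption made for this illustration, not a procedure stated in the paper.

```python
from statistics import median

# Peak positions of the correlation intensity quoted for the hpyl genome.
peaks = [96, 207, 324, 438, 552, 666, 780]


def quasi_period(positions):
    """Return (median spacing, list of spacings) between consecutive peaks."""
    spacings = [b - a for a, b in zip(positions, positions[1:])]
    return median(spacings), spacings


d_b3, spacings = quasi_period(peaks)

# Deviation of each peak from the model d = 96 + x * d_b3, x = 0, 1, 2, ...
residuals = [d - (peaks[0] + x * d_b3) for x, d in enumerate(peaks)]
```

Only the second peak (207 instead of 210) deviates from the model, which is why the structure is described as quasi-periodic rather than strictly periodic.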

Secondly, the synecho genome has two basic periodic lengths, 6 and 888, and
a basic quasi-periodic length of 296. Fig. 5(b) shows the correlation
intensity at different correlation distances with $l = 15$ for the synecho
genome, with a local region magnified. There exist some equidistant parallel
lines with basic periodic lengths, which form periodic correlation
structures in short and long ranges of correlation distances, respectively.
The two basic periodic lengths are determined as $d_{b_1} = 6$ and
$d_{b_2} = 888$. Moreover, in Fig. 5(b), there also exist some approximately
equidistant parallel lines at the positions
$d_1 = 297 + x_1 d_{b_2} \approx (1 + 3x_1) d_{b_3}$ and
$d_2 = 591 + x_2 d_{b_2} \approx (2 + 3x_2) d_{b_3}$, where the basic
quasi-periodic length $d_{b_3}$ is 296 and $x_1, x_2 = 0, 1, 2, \cdots$.
They form quasi-periodic correlation structures in a long range of the
correlation distance. In Table II, there exist some periodic nucleotide
strings with integer multiples of $d_{b_1}$ and the basic string length
$p_2 = d_{b_2}$, but no periodic nucleotide strings with the basic string
length $p_3 = d_{b_3}$. For example, the nucleotide string $\Sigma_1$ =
gag ... tga (527703-527770) with $L_{\Sigma_1} = 68$ is formed by repeating
the basic string gagc^g^a^c^tga^c^ with the length $p_1 = 3d_{b_1}$, where
$L_{\Sigma_1} = 3p_1 + 14$. The nucleotide string $\Sigma_2$ = cac ... gH
(2354010-2355833) with $L_{\Sigma_2} = 1824$ is formed by repeating the
basic string cac ... cat with the length $p_2$, where
$L_{\Sigma_2} = 2p_2 + 48$. Moreover, the nucleotide string $\Sigma_3$ =
{ctga'^c^gagc^g^a^cj^ctga (527395-527434) with $L_{\Sigma_3} = 40$ is
transferred to the places (527473-527512), (527491-527530), (527509-527548),
(527527-527566) and (527545-527584) with the correlation distances
$13d_{b_1}$, $16d_{b_1}$, $19d_{b_1}$, $22d_{b_1}$ and $25d_{b_1}$,
respectively. The non-periodic nucleotide string $\Sigma_4$ = cac ... tcg
(2354010-2354300) with $L_{\Sigma_4} = 291$ is transferred to the places
(2356674-2356964), (2357562-2357852) and (2358450-2358740) with the
correlation distances $3d_{b_2}$, $4d_{b_2}$ and $5d_{b_2}$, respectively.
Both the repetition of basic periodic nucleotide strings and the
transference of non-periodic nucleotide strings would form the periodic
correlation structures with approximately the same increasing periods in
short and long correlation distances, but only the transference of
non-periodic nucleotide strings would form the quasi-periodic correlation
structures in a long correlation distance.
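The relation between the long period 888 and the quasi-period 296 of the synecho genome, and the quoted correlation distances, can be verified from the numbers in the text (a worked calculation, with $x_1, x_2 = 0, 1, 2, \cdots$):

```latex
\begin{aligned}
888 &= 3 \cdot 296,\\
d_1 &= 297 + 888\,x_1 = 296\,(1 + 3x_1) + 1,\\
d_2 &= 591 + 888\,x_2 = 296\,(2 + 3x_2) - 1,\\
527473 - 527395 &= 78 = 13 \cdot 6, \qquad 527545 - 527395 = 150 = 25 \cdot 6,\\
2356674 - 2354010 &= 2664 = 3 \cdot 888, \qquad 2358450 - 2354010 = 4440 = 5 \cdot 888 .
\end{aligned}
```

The two families of lines therefore sit one base above and one base below exact multiples of 296, which is why they are described as quasi-periodic.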

(5) The correlation distance contains almost no increasing periods. The
genomes of Aquifex aeolicus VF5 (aquae), Rhizobium sp. NGR234 plasmid
pNGR234a (pNGR234), Chlamydophila pneumoniae CWL029 (cpneu) and Chlamydia
trachomatis D/UW-3/CX (ctra) are among the cases with such characteristics.
Consider the aquae genome as an example. Fig. 6 shows the correlation
intensity at different correlation distances with $l = 15$ for the aquae
genome. There exist some equidistant parallel lines with a basic periodic
length, which is determined as $d_b = 67$. However, for the basic periodic
length $d_b = 67$, the maximal correlation intensity $\Xi(d)$ is only 179
and the correlation structure has only three peaks of the correlation
intensity, at $d_b$ and two of its integer multiples. Such a weak
correlation intensity with only a few peaks in the correlation distance may
not make any periodic correlation structures. In Table II, there are also no
periodic nucleotide strings for the almost non-periodic correlation
structure. So the aquae genome has almost no periodic correlation
structures.

5 Conclusion and discussions 

In summary, using the metric representation and the recurrence plot method, 
we have observed periodic correlation structures in bacterial and archaeal 
complete genomes. All basic periodic lengths in the periodic correlation 
structures are determined. On the basis of the periodic correlation struc- 
tures, the bacterial and archaeal complete genomes, as classified into five 
groups, display four kinds of fundamental transferring characteristics: a sin- 
gle increasing period, several increasing periods, an increasing quasi-period 
and almost no increasing period. The mechanism of the periodic correlation
structures is further analyzed by determining all long periodic nucleotide 
strings in the bacterial and archaeal complete genomes and is explained as 
follows: both the repetition of basic periodic nucleotide strings and the trans- 
ference of non-periodic nucleotide strings would form the periodic correlation 
structures with approximately the same increasing periods. 

In comparison with the complete genome of the Saccharomyces cerevisiae
yeast [32], it is found that the bacterial, archaeal and yeast complete
genomes have the same four kinds of fundamental transferring characteristics
of nucleotide strings. They preferably choose the basic periodic length
$d_b \approx 67$ or its double $d_b \approx 135$ in the periodic correlation
structures, even though they do not have basic string lengths of long
periodic nucleotide strings that are equal to the basic periodic lengths.
The basic periodic length $d_b \approx 135$ was also found in the
correlation analysis of the human genomes [10].

Although more and more biological functions of the junk DNA in cells are
being found, the mystery of transposable elements in whole genomes remains
to be unraveled. The purpose of this work is to depict the genome structure
in the bacterial and archaeal complete genomes and explain the genome
dynamics in terms of nucleotide string transfer. The proposed periodic
correlation structures with approximately the same increasing periods may
have fundamental importance for the biological functions of the junk DNA.
Table I. Basic strings and their lengths of long periodic nucleotide strings (> 20 bases)
in bacterial and archaeal complete genomes. $N_p$ is the number of the periods. $N_s$ is the
total number of the basic strings.

No.  Genome (length)     N_p/N_s  Periods (basic strings)
1    mgen (580074)       1/4      3(tag, tgt, ct^, cta)
2    mjan (1664970)      3/4      1(g), 67(ai2 ... i^c), 68(ai^ ... atg, gt'^ ... a^c)
3    hpyl (1667867)      8/14     6(cta'^gt), 7(tgat^ag), 8(t'^ gct"^ ga, t^atgtat, tca^gcm^),
                                  12(ctc ... ctc, aH ... t^), 16(t'^a ... tat, aH ... ata),
                                  21(t^ ... t^c, tgt ... t^c, tca ... tca), 24(t^c ... t^c),
                                  390(t^ ... a^t)
4    hpyl99 (1643831)    10/19    1(a), 7(t^agtga), 8(ca^c'^at'^, t'^atgtat, cat^ca^t),
                                  9(t^gatga, f^caHc), 10(a'^gat'^a^c),
                                  12(gt'^ ... tgt, ... ta^, tct ... atc),
                                  15(a3 ... t^, t^c ... gat, ca^ ... ga^, a^ ... t^c),
                                  16(a^ ... t^c), 21(atc ... atc, tgt ... t^c),
                                  228(tcg ... ct^)
5    bbur (910724)       2/3      60(tg'' ... t'g), 162(5*2 ... ct", gag ... ctg)
6    rpxx (1111523)      1/1      84(at'^ ... aH)
7    hinf (1830138)      6/12     3(at^), 4(gtct, t^ga, t'^g'^, tgac, a'^(?, t^gc, co?t),
                                  5(t^atc), 9(cgcH''gt''), 12(cf ... gag), 15(a^ ... ag^)
9    mpneu (816394)      1/3      12(t'^a ... cgc, tcg ... agt, ct^ ... gca)
10   mthe (1751377)      3/3      5(cagtc), 9(ct^ct^gta), 44(tat ... ctg)
14   mtub (4411529)      19/34    9(t'^gtg^(?, a^cg^cg^c, g^cg^cac?), 15(tgc ... cac),
                                  30(gc'^ ... gtc, gt^ ... cag), 36(acg ... acg, ga^ ... gca),
                                  51(tga ... gct), 53(ctc ... ctg, ctg ... gct, cat ... a^c),
                                  54(agt ... gc?, g^c ... aca),
                                  56(gc^ ... tgc, gcg ... cga, c^g ... ac^),
                                  57(gtg ... tg"^, cta ... gct, gca ... gct, g(? ... aca),
                                  58(g(? ... cga), 59(agt ... cta, gcg ... cta, cgc ... cc?),
                                  63(gag ... agt), 69(5^0 ... t(?), 75(agc ... gtc, c^g ... cH),
                                  77(gat ... gct), 78(gH ... gct), 79(gH ... (?a),
                                  111(gag ... c^a), 615(cg'' ... cg"")
18   ecoli (4639221)     2/3      8(atgaHg, gcactatg), 113(cgc ... t^a)
19   synecho (3573470)   7/9      17(tat ... tgc), 18(gag ... c^), 30(5a5f ... c^, gag ... cH),
                                  42(tca ... aH), 78(gag ... c^), 318(gat ... at^),
                                  888(cac ... cat, o?c ... cgt)
22   tpal (1138011)      2/2      24(ctc ... t^c), 93(gct ... ga"^)
Table II. Basic periodic lengths of periodic correlation structures and basic string
lengths of long periodic nucleotide strings (> 20 bases) in bacterial and archaeal complete
genomes (denoted $d_b$ and $p$, respectively).

No.  Genome (Acc. No.)     d_b            p
1    mgen (L43967)         3              3
2    mjan (L77117)         67             1, 67, 68
3    hpyl (AE000511)       7, 8, ≈114     6, 7, 8, 12, 16, 21, 24, 390
4    hpyl99 (AE001439)     8, 15, 21      1, 7, 8, 9, 10, 12, 15, 16, 21, 228
5    bbur (AE000783)       162            60, 162
6    rpxx (AJ235269)       84             84
7    hinf (L42023)         4              3, 4, 5, 9, 12, 15
8    pNGR234 (U00090)      (6)            -
9    mpneu (U00089)        12             12
10   mthe (AE000666)       67             5, 9, 44
11   aquae (AE000657)      (67)           -
12   pyro (BA000001)       67             -
13   aful (AE000782)       67             -
14   mtub (AL123456)       9, 15, (57)    9, 15, 30, 36, 51, 53, 54, 56, 57, 58,
                                          59, 63, 69, 75, 77, 78, 79, 111, 615
15   pabyssi (AL096836)    67             -
16   tmar (AE000512)       67             -
17   cpneu (AE001363)      (330)          -
18   ecoli (U00096)        100, 113       8, 113
19   synecho (BA000022)    6, 888, ≈296   17, 18, 30, 42, 78, 318, 888
20   ctra (AE001273)       (108), (150)   -
21   aero (BA000002)       65             -
22   tpal (AE000520)       24             24, 93
23   bsub (AL009126)       ≈5000          -
Figure captions

Fig. 1. Plots of correlation intensity $\Xi(d)$ versus correlation distance $d$
for (a) periodic and (b) random sequences.

Fig. 2. A plot of correlation intensity $\Xi(d)$ versus correlation distance $d$
for the Methanococcus jannaschii DSM 2661 (mjan) genome.

Fig. 3. A plot of correlation intensity $\Xi(d)$ versus correlation distance $d$
for the Helicobacter pylori J99 (hpyl99) genome.

Fig. 4. A plot of correlation intensity $\Xi(d)$ versus correlation distance $d$
for the Bacillus subtilis subsp. subtilis str. 168 (bsub) genome.

Fig. 5. Plots of correlation intensity $\Xi(d)$ versus correlation distance $d$
for (a) the Helicobacter pylori 26695 (hpyl) and (b) the Synechocystis sp.
PCC6803 (synecho) genomes.

Fig. 6. A plot of correlation intensity $\Xi(d)$ versus correlation distance $d$
for the Aquifex aeolicus VF5 (aquae) genome.